Llama3 hybrid implementation using submeshes #18777
base: main
Conversation
Clean 👌
To do:
- Add at least one CI test that exercises DP. I suggest adding a demo to the t3k tests (a possible parametrization is sketched after the snippet below).
```python
if is_ci_env and num_devices == 8 and data_parallel > 1 and not ("3.2-1B" in llama_dir or "3.1-8B" in llama_dir):
    pytest.skip("CI runs only hybrid Llama3 1b and 8b on T3K")
```
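A minimal sketch of how a t3k demo test could be parametrized over DP, under assumptions (test name, ids, and values are illustrative, not the PR's actual test):

```python
import pytest

# Hypothetical DP sweep for a t3k (8-device) hybrid demo test: data_parallel=4
# means 4 replicas, each tensor-parallel over 8 // 4 = 2 devices.
@pytest.mark.parametrize("data_parallel", (1, 2, 4), ids=("tp8", "dp2_tp4", "dp4_tp2"))
def test_llama_demo_hybrid(data_parallel):
    num_devices = 8  # T3K
    tensor_parallel = num_devices // data_parallel
    assert data_parallel * tensor_parallel == num_devices
    # The real test would open a 1x8 mesh, split it into `data_parallel`
    # submeshes, and run the existing Llama3 demo on each replica.
```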
What about 3B?
I wanted to avoid burdening CI with additional tests. 1B and 8B seemed sufficient for perf regression checks, being the smallest and largest variants of the smaller Llama3 models. Should we add 3B anyway?
```python
return data_parallel, mesh_device.create_submeshes(ttnn.MeshShape(1, num_devices // data_parallel))
```
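To illustrate the submesh split, a sketch under assumptions (mesh opening and shapes are illustrative; only `create_submeshes` is taken from the diff above):

```python
import ttnn

num_devices = 8     # e.g. a T3K system
data_parallel = 4   # 4 independent model replicas

# Open the full 1x8 mesh, then carve it into data_parallel submeshes of
# num_devices // data_parallel devices each; each replica stays tensor-parallel
# within its own submesh.
mesh_device = ttnn.open_mesh_device(mesh_shape=ttnn.MeshShape(1, num_devices))
submeshes = mesh_device.create_submeshes(ttnn.MeshShape(1, num_devices // data_parallel))
assert len(submeshes) == data_parallel
```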
```python
def allocate_kv_cache(kv_cache_shape, dtype, num_layers, mesh_device):
```
TODO (@ipotkonjak-tt and/or @skhorasganiTT) Modify KV creation in vLLM to use this function and test with DP
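For context, a minimal sketch of what a per-submesh KV cache allocation could look like (the signature comes from the diff above; the body is an assumption, not necessarily how this PR implements it):

```python
import torch
import ttnn

def allocate_kv_cache(kv_cache_shape, dtype, num_layers, mesh_device):
    # Assumption: one zero-initialized (K, V) pair per layer, replicated across
    # the devices of the given (sub)mesh; vLLM would call this once per submesh.
    kv_cache = []
    for _ in range(num_layers):
        layer_kv = [
            ttnn.from_torch(
                torch.zeros(kv_cache_shape),
                dtype=dtype,
                layout=ttnn.TILE_LAYOUT,
                device=mesh_device,
                mesh_mapper=ttnn.ReplicateTensorToMesh(mesh_device),
            )
            for _ in range(2)  # K and V
        ]
        kv_cache.append(layer_kv)
    return kv_cache
```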
Problem description
The Llama3 models currently lack support for data / hybrid parallelism.
What's changed
Adds hybrid (tensor + data) parallelism to the Llama3 code base using the concept of submeshes. The implementation lives mainly at the LlamaGenerator level: the MeshDevice is partitioned into submeshes, each subset of devices runs an independent copy of the model, and each copy remains tensor-parallel within its submesh. A sketch of the resulting generator-level structure is given below.
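A minimal sketch of that structure, under assumptions (class and method names other than `create_submeshes` are illustrative, not the PR's actual LlamaGenerator API):

```python
import ttnn

class HybridLlamaGenerator:
    """Illustrative generator holding one tensor-parallel model replica per submesh."""

    def __init__(self, mesh_device, num_devices, data_parallel, model_factory):
        # One submesh (and one model replica) per data-parallel group.
        self.submeshes = mesh_device.create_submeshes(
            ttnn.MeshShape(1, num_devices // data_parallel)
        )
        # model_factory(submesh) builds the usual tensor-parallel model on a submesh.
        self.models = [model_factory(submesh) for submesh in self.submeshes]

    def generate(self, batches):
        # Assumption: the global batch is split into len(self.models) chunks and
        # each chunk is served by an independent replica.
        assert len(batches) == len(self.models)
        return [model.generate(batch) for model, batch in zip(self.models, batches)]
```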
Checklist